Abstract



A standard approach for comparing biological strategies is to examine the mean and
variance in reproductive success. These values rely on measures of the first two moments
of the offspring distribution. Here we discuss an alternative, comparing strategies by
their probability of extinction. We focus on the interplay between extinction and the
moments of the offspring distribution. The probability of extinction decreases with
increasing odd moments and increases with increasing even moments, a property which is
intuitively clear. There is no closed form solution to calculate the probability of
extinction in general, and numerical methods are often used to infer its value. Alternatively,
one can use analytical approaches to generate bounds on the extinction probability. We
discuss these bounds, focusing on the theory of s-convex ordering of random variables,
a method primarily used in the field of actuarial sciences. This method can be used to
generate "worst case scenario" distributions using the first few moments of the offspring
distribution, which then lead to upper bounds on the probability of extinction.



1 Extinction of a branching process 



Survival is ultimately tied to population growth; life avoids extinction through replication.
Populations with rapid growth will often avoid extinction. However, a population may have
a high expected growth rate but nevertheless go extinct with near certainty (Lewontin and
Cohen, 1969). For example, populations with large variation in reproductive success can
sometimes have a high probability of extinction, even if they have a high expected rate of
growth (Tuljapurkar and Orzack, 1980).

Similarly, investors and gamblers can avoid gambler's ruin through growth of capital.
However, a gambler cannot simply apply the strategy with the highest expected growth rate
because this may run a high risk of ruin. For example, investors can use the Kelly ratio
(Kelly, 1956) to maximize expected geometric growth of their capital, but strict adherence
to this ratio can be risky, and playing a more conservative strategy is often recommended
(MacLean et al., 2010).



To estimate the probability of gambler's ruin, one can use approximations based on moments
(Ethier and Khoshnevisan, 2002; Canjar, 2007; Hürlimann, 2005). Moment-based estimations
of gambler's ruin have been developed extensively in the field of actuarial science (Denuit and
Lefèvre, 1997; Denuit et al., 1999; Hürlimann, 2005; Courtois et al., 2006). Here we apply
these approaches to branching processes. The mathematics of gambler's ruin is very similar
to that of extinction in a branching process (Courtois et al., 2006). Both statistical models
involve a random variable (payoff/offspring number), resulting in a random walk (change
in capital/change in population size), and an absorbing state (ruin/extinction). Moreover,
both processes are assumed to be Markovian, and finding the probability of ruin/extinction
involves solving for the root of a convex function.

To investigate biological extinction, we use a Galton-Watson branching process, in which,
at each discrete time interval, each individual generates $i$ discrete offspring with probability
$p_i$, and zero offspring with probability $p_0$. Without loss of generality we assume that an
individual produces its offspring and then dies, i.e. each individual in a population is restricted
to a single generation. The offspring number is a random variable, which we denote by $X$. Let $n$
be the maximum value of $X$, and thus $X$ takes values in the state space
$\mathcal{D}_n = \{0, 1, 2, \ldots, n\}$.

The elements of the state space $\mathcal{D}_n$ and the moments of $X$ are non-negative. This
naturally leads to the use of s-convex extremal random variables to obtain bounds on the
probability of extinction (Courtois et al., 2006; Denuit and Lefèvre, 1997; Denuit et al., 1999;
Hürlimann, 2005). Here we discuss how the moments of the offspring distribution are related to
the probability of extinction, expanding on the work by Courtois et al. (2006). We show how the
first few moments of $X$ can be used to generate extremal random variables representing "worst
case scenarios". The extremal random variables examined here provide upper bounds to the
probability of extinction.



2 Extinction in the Galton-Watson branching process

At any given time $t$, the size of a population ($Z_t$) is the number of individuals in the
branching process. We set $Z_0 = 1$ unless otherwise specified. The probability of extinction of a
branching process is $q = \lim_{t \to \infty} P(Z_t = 0 \mid Z_0 = 1)$.



The recursive formula for finding $q$ can be found through a first step analysis (Kimmel and
Axelrod, 2002). The probability that the lineage of a single individual expires is then the
probability that it dies without offspring ($p_0$) plus the probability that it produces a single
offspring whose lineage dies out ($p_1 q$) plus the probability that it produces two offspring
whose joint lineages die out ($p_2 q^2$), and so on.

This leads to the formal definition of the probability generating function:

$$f(q) = E[q^X] = p_0 + p_1 q + p_2 q^2 + p_3 q^3 + \ldots + p_n q^n = \sum_{k=0}^{n} p_k q^k. \tag{1}$$

The probability of extinction of a branching process starting with a single individual is the
smallest root of the equation $f(q) = q$ for $q \in [0, 1]$.
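The root-finding step can be sketched numerically. The following is a minimal illustration, not the authors' method: it assumes the offspring distribution is given as a list `p` with `p[k]` the probability of `k` offspring, and it uses plain fixed-point iteration from $q = 0$, which climbs monotonically to the smallest fixed point because $f$ is non-decreasing on $[0, 1]$. The function names and the example distribution are hypothetical.

```python
def pgf(p, q):
    """Probability generating function f(q) = sum_k p[k] * q**k."""
    return sum(pk * q**k for k, pk in enumerate(p))


def extinction_probability(p, tol=1e-12, max_iter=100_000):
    """Approximate the smallest root of f(q) = q on [0, 1].

    Iterates q <- f(q) starting from q = 0 until successive values
    differ by less than tol.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(p, q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # not converged within max_iter; best available estimate


# Illustrative distribution (an assumption, not taken from the paper):
# p0 = 0.25, p1 = 0.25, p2 = 0.5, for which f(q) = q has roots 1/2 and 1.
p_example = [0.25, 0.25, 0.5]
q_example = extinction_probability(p_example)
```

Convergence is slow when the mean offspring number is close to 1, so a bracketing root finder may be preferable in that regime.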

If the population starts with more than one individual, $Z_0 = N$ with $N > 1$, and the
generating functions for each individual are independent, then

$$\lim_{t \to \infty} P(Z_t = 0 \mid Z_0 = N) = q^N.$$

Therefore, we solve for the case $Z_0 = 1$ with the understanding that $q$ can then be used to
find extinction probabilities for any starting population size.

The solution $q = 1$ is always a root of $f(q) = q$ and is not necessarily the least positive
root. In some cases, the probability of extinction is trivially obvious. For instance, if
$p_0 = 0$, i.e. an individual always produces at least one offspring, then $q = 0$. Furthermore,
cases where $E[X] < 1$ always yield $q = 1$ (Kimmel and Axelrod, 2002).

Analytically solving for the probability of extinction for branching processes with $p_0 > 0$
and $E[X] > 1$ can be difficult because $f(q) = q$ has $n$ complex-valued roots according to the
fundamental theorem of algebra. In the following we point out how (1) can be seen in terms of
moments of the offspring distribution, and discuss how this can be used to estimate $q$.



3 Moments of the branching process 

Let $m_k = E[X^k]$ denote the $k$th moment of the branching process generator $X$. The first
moment, $m_1$, is equivalent to the average offspring number. Higher moments can be used to
obtain other summary statistics of the distribution, such as the variance $\sigma^2 = m_2 - m_1^2$.
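As a small worked illustration of these definitions, the raw moments and the variance can be computed directly from a probability vector. The helper name `raw_moment` and the example distribution are assumptions made for this sketch.

```python
def raw_moment(p, k):
    """k-th raw moment m_k = sum_i p[i] * i**k of an offspring distribution."""
    return sum(pi * i**k for i, pi in enumerate(p))


# Illustrative distribution (an assumption): p0 = 0.25, p1 = 0.25, p2 = 0.5.
p_example = [0.25, 0.25, 0.5]
m0 = raw_moment(p_example, 0)   # total probability
m1 = raw_moment(p_example, 1)   # mean offspring number
m2 = raw_moment(p_example, 2)
variance = m2 - m1**2
```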

The Laplace transform of (1) can be used to (recursively) express extinction in terms of the
moments of the branching process (and itself):

$$f(q) = E[q^X] = E\left[e^{X \log q}\right] \tag{2}$$

$$= 1 + m_1 \log q + m_2 \frac{(\log q)^2}{2} + m_3 \frac{(\log q)^3}{6} + \ldots \tag{3}$$

where the leading term is $m_0 = 1$. Note that the moments $m_k$ are positive and, because
$\log q < 0$, the signs of the terms alternate. Therefore, even moments increase the probability
of extinction while odd moments decrease it. Additionally, if $q \in (e^{-1}, 1)$ then
$\log q \in (-1, 0)$ and the powers $(\log q)^k$ shrink with $k$. So approximations, $f^*(q)$,
obtained by dropping the remainder term in

$$f(q) = \sum_{k=0}^{s-1} m_k \frac{(\log q)^k}{k!} + o\left((\log q)^s\right) \tag{4}$$

for an $s > 3$ are only accurate when $q$ is large and the moments are small. As
$q \downarrow 0$, the approximation requires more and more terms to be accurate. Therefore, when
$q$ is small the first few moments are not necessarily informative about extinction.
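The loss of accuracy for small $q$ can be checked numerically. The sketch below compares a truncated moment series with the exact generating function at a value of $q$ near 1 and at a small value of $q$. The number of retained terms, the helper names, and the example distribution are assumptions chosen for illustration.

```python
import math


def pgf(p, q):
    """Exact generating function f(q) = sum_k p[k] * q**k."""
    return sum(pk * q**k for k, pk in enumerate(p))


def raw_moment(p, k):
    """k-th raw moment of the offspring distribution."""
    return sum(pi * i**k for i, pi in enumerate(p))


def truncated_series(p, q, terms):
    """Sum of m_k * (log q)**k / k! for k = 0, ..., terms - 1."""
    log_q = math.log(q)
    return sum(
        raw_moment(p, k) * log_q**k / math.factorial(k) for k in range(terms)
    )


# Illustrative distribution (an assumption): p0 = 0.25, p1 = 0.25, p2 = 0.5.
p_example = [0.25, 0.25, 0.5]
err_near_one = abs(truncated_series(p_example, 0.9, 4) - pgf(p_example, 0.9))
err_small_q = abs(truncated_series(p_example, 0.1, 4) - pgf(p_example, 0.1))
```

With four terms the truncation error at $q = 0.9$ is several orders of magnitude smaller than at $q = 0.1$, in line with the argument above.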



4 Estimating extinction 



Gambling literature investigates an alternative equation to $f(q) = q$ (Ethier and Khoshnevisan,
2002; Canjar, 2007). Here, one divides by $q$ on both sides of $f(q) = q$, and a simple
rearrangement results in:

$$0 = E\left[q^{X-1}\right] - 1 \tag{5}$$

We must introduce new notation to simplify. Define the modified random variable
$\tilde{X} = X - 1$ and its moments $\tilde{m}_k = E[(X - 1)^k]$. This variable represents the
change in population size between generations, equivalent to the return on a gamble (total win
or loss):

$$Z_{t+1} = Z_t + \sum_{i=1}^{Z_t} \tilde{X}_i$$

This is also the approach used in Feller (1968), in which $\tilde{X}$ represents a random walk,
the generating function is defined as $\tilde{f}(q) = E[q^{\tilde{X}}]$, and the probability of
extinction is the smallest positive root of $\tilde{f}(q) = 1$. Using the Laplace transform of
the new equation we obtain:



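$$1 = E\left[q^{\tilde{X}}\right] = E\left[\sum_{k=0}^{\infty} \tilde{X}^k \frac{(\log q)^k}{k!}\right]
\quad\Longrightarrow\quad
0 = \sum_{k=1}^{\infty} \tilde{m}_k \frac{(\log q)^k}{k!} \tag{6}$$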



If $q$ is assumed to be near 1, then approximations similar to (4) can be made. For example,
if two moments are known:

$$0 = \tilde{m}_1 \log q + \tilde{m}_2 \frac{(\log q)^2}{2} + o\left((\log q)^3\right)$$

$$q \approx \exp\left(-\frac{2 \tilde{m}_1}{\tilde{m}_2}\right)$$
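A quick numerical check of the two-moment estimate is sketched below. It assumes a supercritical distribution (mean above 1, so the first shifted moment is positive); the function names and the example distribution, whose exact extinction probability is $1/2$, are illustrative assumptions.

```python
import math


def shifted_moment(p, k):
    """k-th moment of X - 1, i.e. sum_i p[i] * (i - 1)**k."""
    return sum(pi * (i - 1)**k for i, pi in enumerate(p))


def two_moment_estimate(p):
    """Estimate q as exp(-2 * m~_1 / m~_2), using the moments of X - 1.

    Only meaningful when the mean offspring number exceeds 1.
    """
    mt1 = shifted_moment(p, 1)
    mt2 = shifted_moment(p, 2)
    if mt1 <= 0:
        raise ValueError("estimate requires a mean offspring number above 1")
    return math.exp(-2.0 * mt1 / mt2)


# Illustrative distribution (an assumption): p0 = 0.25, p1 = 0.25, p2 = 0.5.
# Here m~_1 = 0.25 and m~_2 = 0.75, so the estimate is exp(-2/3).
q_estimate = two_moment_estimate([0.25, 0.25, 0.5])
```

For this example the estimate is about 0.51, close to the exact root of 0.5, although the agreement degrades when the true $q$ is far from 1.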



If higher moments are known, or can be estimated (e.g. Gonzalez et al., 2008), the extinction
estimate can be improved by including more terms. However, adding only the first three
moments results in a cubic that has no real root on $q \in (0, 1)$ if
$\tilde{m}_1 \tilde{m}_3 > 3 \tilde{m}_2^2 / 8$ (Ethier and Khoshnevisan, 2002). Alternatively,
truncating after any even moment always provides an equation with a real root on
$q \in (0, 1)$, but as previously noted, these approximations are only meaningful if $q$ is
reasonably large.

The advantage of this approach is that finding the root of (6) does not involve dividing
$\log q$ by $q$, as would be required when working with the untransformed equation
$f(q) = q$. This produces a simple approximation for the probability of extinction based on two
moments. However, improved bounds are possible using our original variable $X$ and its moments
(Courtois et al., 2006; Hürlimann, 2005).



5 s-Convex orderings of random variables 



The benefit of using $X$ instead of $\tilde{X}$ is that it conveniently allows for s-convex
ordering (Courtois et al., 2006; Denuit and Lefèvre, 1997; Hürlimann, 2005). Define the moment
space for all random variables with state set $\mathcal{D}_n$ and fixed first $s - 1$ moments
$m_1, \ldots, m_{s-1}$ by

$$\mathcal{B}_{s,n} = \left\{ X \text{ on } \mathcal{D}_n : E[X^k] = m_k,\ k = 1, \ldots, s - 1 \right\}.$$

Our random variable $X$ cannot take negative values so our moment space contains only
positive elements.

For two random variables $X$ and $Y$ with state set $\mathcal{D}_n$, we say that $X$ is smaller
than $Y$ in the s-convex sense ($X \le_{s\text{-cx}} Y$) if and only if

$$E(X^k) = E(Y^k) \quad \text{for } k = 1, 2, \ldots, s - 1$$
$$E(X^k) \le E(Y^k) \quad \text{for } k \ge s$$

Minimum and maximum extremal distributions on $\mathcal{B}_{s,n}$ can be found for any
distribution on $\mathcal{D}_n$ with fixed first moments $m_1, m_2, \ldots, m_{s-1}$ (Denuit and
Lefèvre, 1997), derived with methods that use Vandermonde determinants (Denuit and Lefèvre,
1997; Hürlimann, 2005) or the cut criterion (Courtois et al., 2006). The random variables for
these extremal distributions are denoted $X^{(s)}_{\min}$ and $X^{(s)}_{\max}$ such that

$$X^{(s)}_{\min} \le_{s\text{-cx}} X \le_{s\text{-cx}} X^{(s)}_{\max} \quad \text{for all } X \in \mathcal{B}_{s,n} \tag{7}$$



Following Denuit and Lefèvre (1997) and Hürlimann (2005) we can now define the extremal
min/max random variables given the first few moments. We begin with the maximal
random variable, $X^{(2)}_{\max}$, defined as:

$$X^{(2)}_{\max} = \begin{cases} 0 & \text{with } p_0 = 1 - \dfrac{m_1}{n} \\[4pt] n & \text{with } p_n = \dfrac{m_1}{n} \end{cases} \tag{8}$$

The study of the moment problem (e.g., Karlin and McGregor, 1957; Prékopa, 1990) yields
an important relationship between consecutive moments on $\mathcal{B}_{2,n}$ conditional on
$m_1 > 1$:

$$(m_i)^{\frac{i+1}{i}} \le m_{i+1} \le n\, m_i \tag{9}$$
And for $X^{(2)}_{\max}$, $m_{i+1} = n\, m_i$, so by (9) this can clearly be seen as the maximum
extremum. Intuitively, this is the "long shot" distribution on $\mathcal{D}_n$, a worst case
scenario. Because the values and respective probabilities of $X^{(2)}_{\max}$ are known, $q$ can
be solved explicitly using (1). This provides an upper limit on extinction because the generating
function for the extremal random variable will be greater than or equal to the generating
function for all other random variables with the same $m_1$ and $n$, on $q \in [0, 1]$.
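The long shot bound can be evaluated with a few lines of code. This is a minimal sketch under the assumption that only the mean `m1` and the maximum `n` are known; the long shot places mass $1 - m_1/n$ at 0 and $m_1/n$ at $n$, and the bound is the smallest root of its generating function equation, found here by fixed-point iteration. The function name is hypothetical.

```python
def long_shot_bound(m1, n, tol=1e-12, max_iter=100_000):
    """Upper bound on q from the long shot: mass 1 - m1/n at 0, m1/n at n.

    Solves q = p0 + pn * q**n by iterating from q = 0, which converges
    monotonically to the smallest root on [0, 1].
    """
    if not 1 < m1 <= n:
        raise ValueError("expected 1 < m1 <= n")
    p0, pn = 1.0 - m1 / n, m1 / n
    q = 0.0
    for _ in range(max_iter):
        q_next = p0 + pn * q**n
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # not converged; best available estimate


# Mean 2 and maximum 20, the values shared by the paper's example distributions.
bound = long_shot_bound(2.0, 20)
```

For a mean of 2 and a maximum of 20 the bound is roughly 0.92, which illustrates how loose the one-moment bound can be when $n$ is much larger than $m_1$.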

$\mathcal{B}_{2,n}$ is a very general moment space, as the first moment alone does not provide
much information about the distribution. Therefore, $X^{(2)}_{\max}$ is not likely to be a tight
upper bound, especially when $n$ is large or unknown. However, in cases where $m_1$ is near $n$
the distribution can be fairly well approximated by $X^{(2)}_{\max}$.

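Unlike $X^{(2)}_{\max}$, $X^{(2)}_{\min}$ does not provide a useful bound on the probability of
extinction. $X^{(2)}_{\min}$ is defined as:

$$X^{(2)}_{\min} = \begin{cases} \xi & \text{with } p_\xi = \xi + 1 - m_1 \\ \xi + 1 & \text{with } p_{\xi+1} = m_1 - \xi \end{cases} \tag{10}$$

where $\xi$ is the integer on $\mathcal{D}_n$ such that

$$\xi < m_1 \le \xi + 1$$

This extremum has $p_0 = 0$ and therefore $q = 0$ because $m_1 > 1$.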



All of the extrema examined here can be found using discrete Chebyshev systems (Denuit
and Lefèvre, 1997). However, extremal bounds are perhaps more intuitive for continuous
random variables, to which the discrete cases can be seen as similar (Courtois et al., 2006;
Shaked and Shanthikumar, 2007; Hürlimann, 2005; Denuit et al., 1999). For example,
$X^{(2)}_{\min}$ in the continuous case has only one possible value, $p_{m_1} = 1$. By (9) this
is clearly an extremum because $(m_i)^{(i+1)/i} = m_{i+1}$. In comparison, the discrete case (10)
has similar properties.



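If the first two moments are known, then a better upper limit can be found. On
$\mathcal{B}_{3,n}$ the minimal distribution in the 3-convex sense is given by:

$$X^{(3)}_{\min} = \begin{cases} 0 & \text{with } p_0 = 1 - p_\xi - p_{\xi+1} \\[4pt] \xi & \text{with } p_\xi = \dfrac{(\xi + 1) m_1 - m_2}{\xi} \\[4pt] \xi + 1 & \text{with } p_{\xi+1} = \dfrac{m_2 - \xi m_1}{\xi + 1} \end{cases} \tag{11}$$

where

$$\xi < \frac{m_2}{m_1} \le \xi + 1.$$

This bound is already known in the branching process literature (Daley and Narayan, 1980).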
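The two-moment bound can be sketched as follows. The code assumes the three-point form on $\{0, \xi, \xi + 1\}$ with $\xi$ the integer bracketing $m_2 / m_1$ from below, and solves the resulting generating function equation by fixed-point iteration. The function names are hypothetical, and the input moments (2 and 5.8) are those of a binomial distribution with 20 trials and success probability 0.1, matching the binomial example in Figure 1(a).

```python
import math


def smallest_root(support, probs, tol=1e-12, max_iter=100_000):
    """Smallest root of sum_j probs[j] * q**support[j] = q on [0, 1]."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**x for x, p in zip(support, probs))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def three_convex_min(m1, m2):
    """Support and probabilities of the 3-convex minimum from m1 and m2."""
    if m1 <= 1 or m2 < m1**2:
        raise ValueError("expected m1 > 1 and m2 >= m1**2")
    xi = math.ceil(m2 / m1) - 1          # integer with xi < m2/m1 <= xi + 1
    p_xi = ((xi + 1) * m1 - m2) / xi
    p_xi1 = (m2 - xi * m1) / (xi + 1)
    p0 = 1.0 - p_xi - p_xi1
    return [0, xi, xi + 1], [p0, p_xi, p_xi1]


# Moments of a binomial distribution with 20 trials and success probability 0.1.
support, probs = three_convex_min(2.0, 5.8)
bound = smallest_root(support, probs)
```

Here the extremal distribution sits on $\{0, 2, 3\}$ with probabilities 0.3, 0.1, and 0.6, and the resulting upper bound is about 0.33. Floating-point rounding in `math.ceil` can matter when $m_2 / m_1$ is an exact integer, so inputs near that boundary deserve care.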

Similar to $X^{(2)}_{\max}$, the extremal random variable $X^{(3)}_{\min}$ represents a worst
case scenario, this time using two moments. The root of the equation

$$f(q) = q = p_0 + p_\xi q^{\xi} + p_{\xi+1} q^{\xi+1} \tag{12}$$

provides an upper bound to the probability of extinction due to the s-convexity of
$\mathcal{B}_{3,n}$, i.e. (12) has greater values on $q \in [0, 1]$ than the probability
generating functions of any other random variable in $\mathcal{B}_{3,n}$. In contrast to
$X^{(2)}_{\max}$, the minimum extremum yields the upper limit in $\mathcal{B}_{3,n}$. The
alternation between minimum and maximum for the worst case scenarios is due to the convexity
of (1).



As with $X^{(2)}_{\min}$, this extremum is perhaps more intuitive in the continuous sense, in
which:

$$X^{(3)\,\text{cont.}}_{\min} = \begin{cases} 0 & \text{with } p_0 = 1 - \dfrac{m_1^2}{m_2} \\[4pt] \dfrac{m_2}{m_1} & \text{with } p_{m_2/m_1} = \dfrac{m_1^2}{m_2} \end{cases} \tag{13}$$

In this case, successive moments simply grow by $m_2 / m_1$ (i.e. $m_{i+1} = m_i\, m_2 / m_1$),
providing a clear minimum on $\mathcal{B}_{3,n}$. And, as was the case for the minimum on
$\mathcal{B}_{2,n}$, the discrete minimum extremum on $\mathcal{B}_{3,n}$ has similar properties
to the continuous minimum extremum.

For both $\mathcal{B}_{2,n}$ and $\mathcal{B}_{3,n}$ the discrete cases are simply
discretizations of the continuous case. Importantly, this is not necessarily the case for higher
moment spaces (Courtois et al., 2006). So, while the continuous cases provide more intuitive
extrema, derivation of the discrete case for higher moments is not as simple as deriving the
continuous case and discretizing.



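Next, we examine the maximum extremum on $\mathcal{B}_{3,n}$:

$$X^{(3)}_{\max} = \begin{cases} \xi & \text{with } p_\xi = \dfrac{(\xi + 1)(n - m_1) + m_2 - n m_1}{n - \xi} \\[4pt] \xi + 1 & \text{with } p_{\xi+1} = \dfrac{(n + \xi) m_1 - m_2 - n \xi}{n - \xi - 1} \\[4pt] n & \text{with } p_n = 1 - p_\xi - p_{\xi+1} \end{cases} \tag{14}$$

where

$$\xi < \frac{n m_1 - m_2}{n - m_1} \le \xi + 1.$$

Notice that if $n m_1 - m_2 > n - m_1$, then $p_0 = 0$ and $q = 0$, and this variable fails to
provide a helpful bound on extinction. Otherwise, $\xi = 0$ and a lower bound on the probability
can be obtained.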
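Both outcomes described above can be reproduced with a short sketch. It assumes the three-point form on $\{\xi, \xi + 1, n\}$ with $\xi$ bracketing $(n m_1 - m_2)/(n - m_1)$ from below, and that $\xi + 1 < n$. The function names and the second set of moments ($m_2 = 30$) are illustrative assumptions; the first set matches a binomial distribution with 20 trials and success probability 0.1.

```python
import math


def smallest_root(support, probs, tol=1e-12, max_iter=100_000):
    """Smallest root of sum_j probs[j] * q**support[j] = q on [0, 1]."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**x for x, p in zip(support, probs))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def three_convex_max(m1, m2, n):
    """Support and probabilities of the 3-convex maximum from m1, m2 and n.

    Assumes xi + 1 < n so that the denominators below are non-zero.
    """
    ratio = (n * m1 - m2) / (n - m1)
    xi = max(math.ceil(ratio) - 1, 0)    # integer with xi < ratio <= xi + 1
    p_xi = ((xi + 1) * (n - m1) + m2 - n * m1) / (n - xi)
    p_xi1 = ((n + xi) * m1 - m2 - n * xi) / (n - xi - 1)
    p_n = 1.0 - p_xi - p_xi1
    return [xi, xi + 1, n], [p_xi, p_xi1, p_n]


# Binomial(20, 0.1) moments: the support excludes 0, so the lower bound is 0.
support_a, probs_a = three_convex_max(2.0, 5.8, 20)
lower_a = smallest_root(support_a, probs_a)

# Hypothetical long-shot-like moments: xi = 0 and the lower bound is informative.
support_b, probs_b = three_convex_max(2.0, 30.0, 20)
lower_b = smallest_root(support_b, probs_b)
```

In the first case the extremal distribution has no mass at zero and the bound collapses to $q \ge 0$; in the second case it has mass 0.4 at zero and gives a lower bound of about 0.85.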



Except under certain criteria, all $X^{(\text{odd})}_{\max}$ and $X^{(\text{even})}_{\min}$ have
$p_0 = 0$ (Hürlimann, 2005; Denuit and Lefèvre, 1997), and therefore s-convex ordering does not
provide general lower limits for extinction of branching processes, other than the obvious
bound $q \ge 0$. For the sake of brevity these extremal distributions will be ignored for higher
moments. We instead focus on extrema that provide upper bounds to extinction.



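The use of three moments can improve bounds on the probability of extinction, but as with
$X^{(2)}_{\max}$, the maximal random variable $X^{(4)}_{\max}$ requires the knowledge of the
maximum, $n$. $X^{(4)}_{\max}$ is defined as:

$$X^{(4)}_{\max} = \begin{cases} 0 & \text{with } p_0 = 1 - p_\xi - p_{\xi+1} - p_n \\[4pt] \xi & \text{with } p_\xi = \dfrac{m_3 - m_2(\xi + 1 + n) + n m_1 (\xi + 1)}{\xi (n - \xi)} \\[4pt] \xi + 1 & \text{with } p_{\xi+1} = \dfrac{m_2(\xi + n) - n m_1 \xi - m_3}{(\xi + 1)(n - \xi - 1)} \\[4pt] n & \text{with } p_n = \dfrac{m_3 - m_2(2\xi + 1) + m_1 \xi(\xi + 1)}{n (n - \xi)(n - \xi - 1)} \end{cases} \tag{15}$$

where

$$\xi < \frac{m_2 n - m_3}{m_1 n - m_2} \le \xi + 1.$$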


While this is a potential improvement to the bound given by $X^{(3)}_{\min}$, the improvement is
sometimes negligible. As $n \to \infty$, the difference between $X^{(4)}_{\max}$ and
$X^{(3)}_{\min}$ vanishes because

$$\lim_{n \to \infty} \frac{m_2 n - m_3}{m_1 n - m_2} = \frac{m_2}{m_1}$$

and because $p_n \to 0$, the generating function for $X^{(4)}_{\max}$ is identical to (12). So,
like the first moment, the third moment is generally uninformative about extinction when $n$ is
unknown, unless assumptions are made about the distribution (e.g. see Daley and Narayan (1980);
Ethier and Khoshnevisan (2002)).
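A sketch of the three-moment bound follows. It assumes a four-point extremal distribution on $\{0, \xi, \xi + 1, n\}$ whose probabilities are fixed by matching the first three moments, with $\xi$ bracketing $(m_2 n - m_3)/(m_1 n - m_2)$ from below. The function names are hypothetical; the moments (2, 5.8, 20.24) are those of a binomial distribution with 20 trials and success probability 0.1.

```python
import math


def smallest_root(support, probs, tol=1e-12, max_iter=100_000):
    """Smallest root of sum_j probs[j] * q**support[j] = q on [0, 1]."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**x for x, p in zip(support, probs))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def four_convex_max(m1, m2, m3, n):
    """Support and probabilities of the 4-convex maximum from m1, m2, m3, n.

    Assumes 1 <= xi and xi + 1 < n so that all denominators are non-zero.
    """
    xi = math.ceil((m2 * n - m3) / (m1 * n - m2)) - 1
    p_xi = (m3 - m2 * (xi + 1 + n) + n * m1 * (xi + 1)) / (xi * (n - xi))
    p_xi1 = (m2 * (xi + n) - n * m1 * xi - m3) / ((xi + 1) * (n - xi - 1))
    p_n = (m3 - m2 * (2 * xi + 1) + m1 * xi * (xi + 1)) / (
        n * (n - xi) * (n - xi - 1)
    )
    p0 = 1.0 - p_xi - p_xi1 - p_n
    return [0, xi, xi + 1, n], [p0, p_xi, p_xi1, p_n]


# Moments of a binomial distribution with 20 trials and success probability 0.1.
support, probs = four_convex_max(2.0, 5.8, 20.24, 20)
bound = smallest_root(support, probs)
```

For these moments the extremal distribution sits on $\{0, 2, 3, 20\}$ with very little mass at 20, and the bound is about 0.31, only slightly below the two-moment bound of about 0.33 for the same distribution.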



If the first four moments are known, the extremal variable $X^{(5)}_{\min}$ can be obtained. Its
distribution takes a simple form, but the equations used to find its values and relative
probabilities are relatively large. Transforming the notation of Hürlimann (2005),
$X^{(5)}_{\min}$ is defined as:

$$X^{(5)}_{\min} = \begin{cases}
0 & \text{with } p_0 = 1 - p_\xi - p_{\xi+1} - p_\eta - p_{\eta+1} \\[6pt]
\xi & \text{with } p_\xi = \dfrac{-(\eta+1)(m_2\eta - m_3) + (m_3\eta - m_4) + (\xi+1)\big((\eta+1)(m_1\eta - m_2) - (m_2\eta - m_3)\big)}{\xi(\eta - \xi)(\eta + 1 - \xi)} \\[6pt]
\xi + 1 & \text{with } p_{\xi+1} = \dfrac{(\eta+1)(m_2\eta - m_3) - (m_3\eta - m_4) - \xi\big((\eta+1)(m_1\eta - m_2) - (m_2\eta - m_3)\big)}{(\xi+1)(\eta - \xi)(\eta - 1 - \xi)} \\[6pt]
\eta & \text{with } p_\eta = \dfrac{-(\xi+1)(m_2\xi - m_3) + (m_3\xi - m_4) + (\eta+1)\big((\xi+1)(m_1\xi - m_2) - (m_2\xi - m_3)\big)}{\eta(\eta - \xi)(\eta - \xi - 1)} \\[6pt]
\eta + 1 & \text{with } p_{\eta+1} = \dfrac{(\xi+1)(m_2\xi - m_3) - (m_3\xi - m_4) - \eta\big((\xi+1)(m_1\xi - m_2) - (m_2\xi - m_3)\big)}{(\eta+1)(\eta - \xi)(\eta + 1 - \xi)}
\end{cases} \tag{16}$$

where

$$\xi < \frac{m_4 - (2\eta + 1) m_3 + \eta(\eta + 1) m_2}{m_3 - (2\eta + 1) m_2 + \eta(\eta + 1) m_1} \le \xi + 1$$

$$\eta < \frac{m_4 - (2\xi + 1) m_3 + \xi(\xi + 1) m_2}{m_3 - (2\xi + 1) m_2 + \xi(\xi + 1) m_1} \le \eta + 1$$

$X^{(5)}_{\min}$ is a more accurate bound than $X^{(4)}_{\max}$, and does not require knowledge of
$n$. As we will show in our examples, this extremum can often provide a tight upper bound.
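Because the two bracketing conditions for $\xi$ and $\eta$ depend on each other, one simple way to evaluate this bound is a brute-force search over integer pairs. The sketch below is an assumption-laden illustration rather than the procedure of Hürlimann (2005): it regroups the probability numerators through two helper quantities, $g(a) = m_3 - (2a+1) m_2 + a(a+1) m_1$ and $h(a) = m_4 - (2a+1) m_3 + a(a+1) m_2$, which is algebraically equivalent to fixing the probabilities by matching four moments on $\{0, \xi, \xi+1, \eta, \eta+1\}$. The function names and the `search_limit` parameter are hypothetical.

```python
def smallest_root(support, probs, tol=1e-12, max_iter=100_000):
    """Smallest root of sum_j probs[j] * q**support[j] = q on [0, 1]."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**x for x, p in zip(support, probs))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def five_convex_min(m1, m2, m3, m4, search_limit=50):
    """Search for (xi, eta) and return the support and probabilities."""

    def g(a):  # equals E[X (X - a)(X - a - 1)]
        return m3 - (2 * a + 1) * m2 + a * (a + 1) * m1

    def h(a):  # equals E[X^2 (X - a)(X - a - 1)]
        return m4 - (2 * a + 1) * m3 + a * (a + 1) * m2

    for xi in range(1, search_limit):
        for eta in range(xi + 2, search_limit + 1):
            if g(xi) <= 0 or g(eta) <= 0:
                continue
            if not (xi < h(eta) / g(eta) <= xi + 1):
                continue
            if not (eta < h(xi) / g(xi) <= eta + 1):
                continue
            d = eta - xi
            p_xi = ((xi + 1) * g(eta) - h(eta)) / (xi * d * (d + 1))
            p_xi1 = (h(eta) - xi * g(eta)) / ((xi + 1) * d * (d - 1))
            p_eta = ((eta + 1) * g(xi) - h(xi)) / (eta * d * (d - 1))
            p_eta1 = (h(xi) - eta * g(xi)) / ((eta + 1) * d * (d + 1))
            p0 = 1.0 - p_xi - p_xi1 - p_eta - p_eta1
            support = [0, xi, xi + 1, eta, eta + 1]
            return support, [p0, p_xi, p_xi1, p_eta, p_eta1]
    raise ValueError("no admissible (xi, eta) found within search_limit")


# Moments of a binomial distribution with 20 trials and success probability 0.1.
support, probs = five_convex_min(2.0, 5.8, 20.24, 81.268)
bound = smallest_root(support, probs)
```

For these moments the search returns the support $\{0, 1, 2, 4, 5\}$ and an upper bound of about 0.21, compared with an extinction probability of about 0.18 for the binomial distribution itself.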



6 Examples



Here we discuss some example distributions, graph their generating functions, and also graph
generating functions for the extremal distributions. The plot of the probability generating
function, $f(q)$, on $q \in (0, 1)$ is a useful way to visualize how the moments are related to
extinction. $f(q)$ takes the value $p_0$ at $q = 0$. At small $q$, $f(q)$ has a slope of
approximately $p_1$. In this part of the function, when $q$ is small, there is little
relationship between $f(q)$ and moments.

The moments are closely related to $f(q)$ at the other end of our domain of interest, when
$q$ is near 1. For example $f'(1) = m_1$. Higher moments begin to influence the function as $q$
moves away from 1.



[Figure 1 panels: (a) Binomial distribution $\mathrm{Bin}_{20,\,0.1}$; (b) Truncated geometric distribution]

Figure 1: Probability generating functions for a Binomial and a truncated Geometric
distribution. Both distributions have a mean of 2.



In our first examples, we compare two distributions with an identical first moment and
maximum ($m_1 = 2$, $n = 20$): a binomial distribution and a truncated geometric distribution
(Figure 1). In these examples, $X^{(2)}_{\max}$ clearly does not provide a good bound.
$X^{(5)}_{\min}$, on the other hand, provides a very tight upper bound in both cases. For the
binomial distribution (Figure 1(a)), the use of three moments provides only minimal improvement
over the use of just two moments. In this case, the odd moments and maximum of the distribution
are not highly informative. In comparison, the odd moments are more helpful in the truncated
geometric distribution (Figure 1(b)).
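The binomial example can be reproduced numerically. The sketch below builds the probability mass function of a binomial distribution with 20 trials and success probability 0.1, computes its first four raw moments, and solves for its extinction probability by fixed-point iteration. The variable and function names are assumptions for illustration; the parameters of the truncated geometric distribution are not stated in the text, so that case is not reproduced here.

```python
import math

n_trials, success = 20, 0.1

# Probability mass function of the binomial offspring distribution.
pmf = [
    math.comb(n_trials, k) * success**k * (1 - success) ** (n_trials - k)
    for k in range(n_trials + 1)
]

# First four raw moments m_1, ..., m_4.
moments = [sum(pk * k**j for k, pk in enumerate(pmf)) for j in range(1, 5)]


def extinction_probability(p, tol=1e-12, max_iter=100_000):
    """Smallest root of f(q) = q on [0, 1], by iterating q <- f(q) from 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(pk * q**k for k, pk in enumerate(p))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


q_binomial = extinction_probability(pmf)
```

The moments come out as approximately 2, 5.8, 20.24, and 81.268, and the extinction probability is about 0.18, which gives a reference value against which the moment-based upper bounds can be compared.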

For our final example, we have chosen a distribution that demonstrates the utility of
examining moments of the offspring distribution. Figure 2 displays a distribution that is almost
identical to the long shot, $X^{(2)}_{\max}$. This can be clearly seen by examining its
distribution, or by examining the plot comparing the two probability generating functions. Here
we do not plot the extrema for higher moments because they only provide trivial improvements to
the bound.

Figure 2: [figure not recovered from the source; the plot legend lists $X^{(2)}_{\max}$ and $f(q)$]



Importantly, this example distribution can be seen as nearly identical to $X^{(2)}_{\max}$ by
simply examining the relationship between the first few moments. The moments for this example
grow rapidly, increasing by a factor that is nearly equal to the maximum of the distribution.
In cases such as this, knowledge of the moments and the maximum of the distribution provides
more than just upper bounds on extinction; it provides sufficient information to estimate
the distribution itself.
7 Summary and Conclusions



The work here is intended to highlight the relationship between the moments of the offspring
distribution and the probability of extinction. The extinction equation can be defined in
terms of moments, but the first few moments are only closely related to extinction when $q$
is reasonably large. But, no matter the value of $q$, there exists an interesting relationship
with even and odd moments: high even moments favor extinction, high odd moments favor
survival. This relationship between even and odd moments is also seen in a stochastic version
of the Price equation, where relative rates of growth increase with increasing odd moments,
and decrease with increasing even moments (Rice, 2008).



Approximations and bounds on extinction can be made if the first few moments are known.
Strict upper bounds can be found by examining s-convex extremal random variables. Using
an even number of moments provides the most useful bounds because these bounds do not
rely on the maximum value of the offspring distribution. If an odd number of moments
is known, s-convex approximations can provide improvements over the bounds obtained from
the next smaller even number of moments. However, these bounds require knowledge of the
maximum and are of limited use when the maximum is large.

And finally, the relationship between moments and the probability of extinction can provide
insights into the evolutionary process. Evolution cannot simply favor high expected rates
of growth because strategies with a high first moment can also have a high probability
of extinction. Evolution also does not simply always favor strategies with high expected
growth and low variance. Rather, strategies with large odd moments and relatively small
even moments are favored for survival. The first moment has the strongest influence on
survival, and the influence decreases for each successive moment. The relative importance
of higher moments depends on the distribution, and in some cases, higher moments should
not be ignored. A similar conclusion about the evolutionary process can be drawn from the
stochastic Price equation (Rice, 2008).